A Measure of data-collapse for scaling 
Data-collapse is a way of establishing scaling and extracting associated exponents in problems 
showing self-similar or self-affine characteristics as e.g. in equilibrium or non-equilibrium phase 
transitions, in critical phases, in dynamics of complex systems and many others. We propose a 
measure to quantify the nature of data collapse. Via a minimization of this measure, the exponents 
and their error-bars can be obtained. The procedure is illustrated by considering finite-size-scaling 
near phase transitions and quite strikingly recovering the exact exponents. 

Scaling, especially finite size scaling (FSS), has 
emerged as an important framework for understanding 
and analyzing problems involving diverging length scales. 
Such problems abound in condensed matter, high energy 
and nuclear physics, equilibrium and non-equilibrium sit- 
uations, thermal and non-thermal problems, and many 
more. The operational definition of scaling is this: A 
quantity $m(t, L)$ depending on two variables, t and L, is 
considered to have scaling if it can be expressed as 

$$ m(t,L) = L^{d} f(t/L^{c}). \tag{1} $$

Depending on the nature of the problem of interest, m 
may refer to magnetization, specific heat, size or some 
other characteristic of a polymer, width of a growing 
or fluctuating surface, etc. Eq. (1) is the FSS form if L 
is a linear dimension of the system and t is any other 
variable, which could even be time in dynamics. In the thermodynamic 
limit of infinite-sized systems, such a scaling 
would have t and L representing two thermodynamic 
parameters such as magnetic field, pressure, or chemical potential, 
or one of them could be time. If L is a length scale, 
then d would look like the dimension of the quantity m, 
and c that of the variable t. In fluctuation-dominated cases, it 
is generally the rule, rather than the exception, that d and 
c assume nontrivial values, different from what one expects 
from dimensional analysis. The exponents and 
the scaling function $f(x)$ then characterize the behavior 
of the system. The fact that two completely independent 
variables (both conceptually and as controlled in experi- 
ments) combine in a nontrivial way to form a single one 
leads to an enormous simplification in the description of 
the phenomenon. This underlies the importance of scal- 
ing. 

A quantitative way of showing scaling is data-collapse 
(also called a scaling plot), which goes back to the original 
observation of Rushbrooke that the coexistence curves 
for many simple systems could be made to fall on a single 
curve. For example, the values of $m(t,L)$ [Eq. (1)] 
for various t and L can be made to collapse onto a single 
curve if $m L^{-d}$ is plotted against $t L^{-c}$. The method of 
data-collapse is therefore a powerful means of establishing 
scaling. It is now used extensively to 
analyze and extract exponents, especially from numerical 
simulations. Given the importance of scaling in a wide 
variety of problems, it is imperative to have an appropriate 
measure of the "goodness of collapse", one that is 
not left to the eye of the beholder. 
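
As an illustration of the rescaling behind a scaling plot, the following minimal Python sketch maps a data point (t, L, m) to the collapse coordinates (t L^{-c}, m L^{-d}). The function name `rescale`, the test function `f`, and the exponent values are assumptions made for this example only and are not taken from the text.

```python
def rescale(t, m, L, d, c):
    """Return the collapse coordinates (t * L**-c, m * L**-d) of one data point."""
    return t * L ** (-c), m * L ** (-d)


def f(x):
    # Arbitrary smooth scaling function, chosen only for illustration.
    return 1.0 / (1.0 + x * x)


# Synthetic data that obey m = L**d * f(t / L**c) exactly (hypothetical exponents).
d_true, c_true = 0.5, -1.0
points = []
for L in (10, 30, 50):
    for t in (-0.02, 0.0, 0.01, 0.03):
        m = L ** d_true * f(t / L ** c_true)
        points.append(rescale(t, m, L, d_true, c_true))

# With the correct exponents every rescaled point lies on the curve y = f(x).
max_dev = max(abs(y - f(x)) for x, y in points)
```

With any other choice of (d, c) the rescaled points from different L would spread over separate curves, which is what a collapse measure has to detect.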

In this paper, we propose a measure that can be used 
to quantify "collapse". This measure can be used, via 
a minimization principle, for an automatic search for 
the exponents, thereby removing the subjectiveness of 
the approach. To show the power of the method and 
the measure, we use it for two exactly known cases, 
namely the finite-size scaling of the specific heat for (1) 
the one-dimensional ferroelectric six-vertex model 
showing a first-order transition and (2) the Kasteleyn 
dimer model exhibiting the continuous anisotropic 
Pokrovsky-Talapov transition. In addition, to show 
the usefulness of the method for noisy data, as expected 
in any numerical simulation, we consider the one-dimensional 
case with extra Gaussian noise added (by 
hand). It is worth emphasizing that the proposed procedure, 
without any bias, recovered the exactly known 
exponents from the specific heat data for finite systems. 

If the scaling function $f(x)$ of Eq. (1) is known, then 
the sum of residuals 

$$ \sum \left| L^{-d} m - f(t/L^{c}) \right|, \tag{2} $$

where the sum is over all the data points, is minimum for 
the right choice of (d, c). In the absence of any statistical or 
systematic error, the minimum value is zero. 
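
The statement above can be checked on synthetic data. The sketch below is only an illustration: the Gaussian-shaped scaling function `f`, the helper name `residual_sum`, and the exponent values are assumptions, not quantities from the text. It evaluates the sum of |L^{-d} m - f(t/L^c)| for the exponents used to generate the data and for a deliberately shifted value of d.

```python
import math


def residual_sum(data, f, d, c):
    """Sum of |L**-d * m - f(t / L**c)| over all (t, L, m) data points."""
    return sum(abs(L ** (-d) * m - f(t / L ** c)) for t, L, m in data)


def f(x):
    # Assumed scaling function, used only to build the test data.
    return math.exp(-x * x)


d0, c0 = 1.0, -1.0  # hypothetical "true" exponents
data = [(t, L, L ** d0 * f(t / L ** c0))
        for L in (10, 20, 40)
        for t in (-0.05, -0.01, 0.0, 0.02, 0.04)]

r_exact = residual_sum(data, f, d0, c0)        # vanishes up to rounding
r_wrong = residual_sum(data, f, d0 + 0.1, c0)  # strictly positive
```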

However, in most situations the function itself is not 
known, although it is generally an analytic function. In the case of 
a perfect collapse, any one of the sets (say set p) can be 
used for $f(x)$. An interpolation scheme can then be used 
to estimate the values for the other sets in the overlapping 
regions, and the residuals are then calculated. Since this 
can be done with any set as the basis, we repeat the procedure 
for all sets. Let the tabulated values of m and t 
be denoted by $m_{ij}$, $t_{ij}$ (the ith value of t for the jth set of 
L, i.e., $L = L_j$ for set j). We now define a quantity $P_b$, 

$$ P_b = \left[ \frac{1}{\mathcal{N}_{\rm over}} \sum_{p} \sum_{j \neq p} \sum_{i,{\rm over}} \left| L_j^{-d} m_{ij} - \mathcal{E}_p\!\left( L_j^{-c} t_{ij} \right) \right|^{q} \right]^{1/q}, \tag{3} $$



where $\mathcal{E}_p(x)$ is the interpolating function based on the 
values of set p bracketing the argument in question (of 
set j). The innermost sum over i is done only for overlapping 
points (denoted by "i, over"), $\mathcal{N}_{\rm over}$ being the 
number of pairs. Though defined with a general q, we 
use q = 1. For $\mathcal{E}_p(x)$, a 4-point polynomial interpolation 
can be used, and if any complex singularity is suspected 
a rational approximation may be used. Extrapolations 
are avoided. The minimum of this function $P_b$ is zero 
and is achievable in the ideal case of perfect collapse with 
the correct values of (d, c), i.e., 

$$ P_b \geq P_b\big|_{\rm abs\ min} = 0. \tag{4} $$



This inequality can then be exploited and a minimization 
of $P_b$ over (d, c) can be used to extract the optimal values 
of the parameters. 
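
A minimal Python sketch of this procedure follows. It is not the implementation used for the results reported here. As stated assumptions, linear interpolation stands in for the 4-point polynomial or rational interpolation, a coarse scan of the (d, c) plane stands in for a proper minimizer, and the test data come from an assumed scaling function `f_exact` (in units of k_B) with exponents d = 1, c = -1. All function and variable names are hypothetical.

```python
import bisect
import math


def f_exact(z):
    # Assumed test scaling function (units of k_B); not part of the method itself.
    u = math.exp(-z)
    return 2.0 * math.log(2.0) ** 2 * u / (2.0 + u) ** 2


def make_sets(sizes, ts, d=1.0, c=-1.0):
    """Synthetic data m = L**d * f(t / L**c): one list of (t, m) per size L."""
    return {L: [(t, L ** d * f_exact(t / L ** c)) for t in ts] for L in sizes}


def collapse_measure(sets, d, c, q=1.0):
    """Residual measure P_b; each set p in turn is the reference curve."""
    scaled = {L: sorted((t * L ** (-c), m * L ** (-d)) for t, m in pts)
              for L, pts in sets.items()}
    total, n_over = 0.0, 0
    for p, ref in scaled.items():
        xs = [x for x, _ in ref]
        ys = [y for _, y in ref]
        for j, pts in scaled.items():
            if j == p:
                continue
            for x, y in pts:
                if x < xs[0] or x > xs[-1]:
                    continue  # outside the overlap: no extrapolation
                # Index of the bracketing interval [xs[k-1], xs[k]].
                k = min(max(bisect.bisect_right(xs, x), 1), len(xs) - 1)
                w = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
                interp = ys[k - 1] + w * (ys[k] - ys[k - 1])  # linear stand-in
                total += abs(y - interp) ** q
                n_over += 1
    return (total / n_over) ** (1.0 / q) if n_over else float("inf")


ts = [-0.1 + 0.0025 * i for i in range(81)]
sets = make_sets((10, 30, 50, 70, 90), ts)

p_exact = collapse_measure(sets, 1.0, -1.0)
p_wrong_d = collapse_measure(sets, 1.2, -1.0)
p_wrong_c = collapse_measure(sets, 1.0, -0.8)

# Coarse scan of the (d, c) plane, used here instead of a simplex minimizer.
grid = [(0.8 + 0.05 * a, -1.2 + 0.05 * b) for a in range(9) for b in range(9)]
d_best, c_best = min(grid, key=lambda dc: collapse_measure(sets, dc[0], dc[1]))
```

Because the interpolation is only linear, `p_exact` is small but not exactly zero even for perfectly scaling data; a higher-order interpolation would reduce this floor.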

In addition to the values of the exponents, estimates 
of errors can be obtained from the width of the mini- 
mum. A simple approach is to take the quadratic part in 
the individual directions along the (d,c) plane. From an 
expansion of $\ln P_b$ around the minimum at $(d_0, c_0)$, the 
width is estimated as 

$$ \Delta d = \eta d_0 \left[ 2 \ln \frac{P_b(d_0 \pm \eta d_0, c_0)}{P_b(d_0, c_0)} \right]^{-1/2} \tag{5a} $$

and 

$$ \Delta c = \eta c_0 \left[ 2 \ln \frac{P_b(d_0, c_0 \pm \eta c_0)}{P_b(d_0, c_0)} \right]^{-1/2} \tag{5b} $$

for a given η. Choosing η = 1%, the final estimate for 
the exponents would be $d_0 \pm \Delta d$, $c_0 \pm \Delta c$, with the error 
bar reflecting the width of the minimum at the 1% level. 
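
One way to read the quadratic expansion is to assume ln P ≈ ln P(x0) + (x - x0)^2 / (2 w^2) along a single exponent direction, so that a step of size η|x0| away from the minimum gives w = η|x0| [2 ln(P(x0 ± η x0)/P(x0))]^{-1/2}. The Python sketch below applies this to a model function with a known width. The helper name, the model, and the averaging over the two sides of the minimum are assumptions made for illustration.

```python
import math


def width_from_log_expansion(P, x0, eta):
    """Width of the minimum of P at x0 from a quadratic expansion of ln P.

    P   : function of one exponent (the other held at its optimal value)
    x0  : location of the minimum (assumed to be a true minimum, P(x) > P(x0))
    eta : fractional step, e.g. 0.01 for the 1% level
    """
    step = eta * abs(x0)
    p0 = P(x0)
    widths = []
    for x in (x0 + step, x0 - step):
        log_ratio = math.log(P(x) / p0)
        widths.append(step / math.sqrt(2.0 * log_ratio))
    # Averaging the two sides is a choice made here, not prescribed by the text.
    return sum(widths) / len(widths)


# Model with a known width: ln P is exactly quadratic around x0 = 1.
w_true = 0.04


def model(x):
    return 0.01 * math.exp((x - 1.0) ** 2 / (2.0 * w_true ** 2))


width = width_from_log_expansion(model, 1.0, 0.01)
```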

We now use the proposed method for different test 
cases. To implement the program we have 
used the routines of Numerical Recipes. To calculate 
$P_b$, POLINT or RATINT has been used for interpolation, 
with HUNT to place a point in the table. For minimization, 
AMOEBA has been used thrice to locate the minimum, 
each time using the current estimates to generate 
a new triangle enclosing the minimum. In the examples 
given below, there was no need for more sophisticated 
minimization routines, which could be needed in case of 
subtle crossover behaviors or nearby minima. 

Let us first consider the one-dimensional six-vertex 
model, which shows a first-order transition. With 
the partition function $Z = 2 + (2x)^N$ for N sites, with 
$x = \exp(-\epsilon/k_B T)$ as the Boltzmann factor, $\epsilon$ being the 
energy of the high-energy vertices, T the temperature and 
$k_B$ the Boltzmann constant, the specific heat can be computed 
exactly. The first-order transition is at x = 1/2, 
for $N \to \infty$, with a δ-function jump in specific heat. The 
N-dependent specific heat (per site), $c_N$, is given by 

$$ c_N = k_B (\ln x)^2 \, 2N \, \frac{(2x)^N}{[2 + (2x)^N]^2}, \tag{6} $$

which for large N and small $t = 1 - 2x$ has the scaling 
form of Eq. (1) with d = 1, c = -1 and 

$$ f(z) = k_B (\ln 2)^2 \, \frac{2 e^{-z}}{(2 + e^{-z})^2}. \tag{7} $$
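
The approach to the scaling form can be checked numerically. The sketch below assumes the per-site specific heat that follows from Z = 2 + (2x)^N, namely c_N = k_B (ln x)^2 2N (2x)^N / [2 + (2x)^N]^2, and the large-N form N f(tN) with f(z) = k_B (ln 2)^2 2 e^{-z} / (2 + e^{-z})^2, both in units of k_B. The function names and the choice z = 0.5 are illustrative assumptions.

```python
import math


def c_N(x, N):
    """Specific heat per site from Z = 2 + (2x)**N, in units of k_B."""
    w = (2.0 * x) ** N
    return math.log(x) ** 2 * 2.0 * N * w / (2.0 + w) ** 2


def f_scaling(z):
    """Large-N scaling function, in units of k_B, with z = t * N."""
    u = math.exp(-z)
    return math.log(2.0) ** 2 * 2.0 * u / (2.0 + u) ** 2


z = 0.5  # fixed value of the scaled variable t * N
deviations = []
for N in (10, 90, 1000):
    t = z / N
    x = 0.5 * (1.0 - t)  # from t = 1 - 2x
    ratio = c_N(x, N) / (N * f_scaling(z))
    deviations.append(abs(ratio - 1.0))
```

The relative deviation from the scaling form shrinks as N grows, which is the kind of non-scaling correction that affects fits to data from small systems.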



From the exact formula, Eq. (6), data were generated 
for N = 10, 30, 50, 70 and 90, for various values of temperature. 
A minimization of $P_b$ gave us the estimate 
d = 0.997 ± 0.04, c = -0.98 ± 0.06, with $P_b$ = 0.56881E-01. 
The exponents are very close to the exact ones. The 
error bars, or the width of the minimum, are to be interpreted 
as an indication of the presence of non-scaling corrections. 
To test this, we have generated data from the 
exact scaling function of Eq. (7). An unbiased minimization 
of $P_b$ then gave d = 1 ± 0.004, c = -1 ± 0.004 with 
$P_b$ = 0.34876E-03. The smallness of the residue and 
of the errors (or the width of the minimum) represents a 
good data collapse. The nature of the data-collapse for 
both the cases is shown in Fig. 1. 

FIG. 1. The collapse of the specific heat for the 1-d vertex 
model as calculated from Eq. (6) [Fig. (a)] and from Eq. (7) [Fig. 
(b)]. The x-axis in both the plots is $|t|/N^{c}$. The upper 
(lower) branch is for t > 0 (t < 0). For both, the exact values 
d = 1, c = -1 are used. 

A similar minimization of $P_b$ was carried out for the 
two-dimensional Kasteleyn dimer model (also isomorphic 
to a two-dimensional 5-vertex model). This is an exactly 
solvable lattice model of the continuous anisotropic 
Pokrovsky-Talapov transition for surfaces, and shows a 
square-root singularity for the specific heat with different correlation 
lengths in the two directions. The specific 
heat for lattices of size M along the direction of the 
"walls" and infinite in the transverse direction is known 
exactly, and its finite-size scaling form has been discussed 
previously. Using the following formula for the specific 
heat per site $c_M$ 



$$ c_M = a \, \frac{M}{2\pi} \int \frac{(2x\cos\phi)^{M}}{[1 + (2x\cos\phi)^{M}]^{2}} \, d\phi, \tag{8} $$

specific heat data were generated for M = 10, 30, 50, 70 
and 90. In this formula, a few unimportant factors 
are put under a and not explicitly shown. The critical 
point is at x = 1/2. A minimization of $P_b$ gave 
d = 0.5 ± 0.03, c = -0.945 ± 0.02, to be compared with 
the exact values d = 0.5, c = -1. The residue factor is 
$P_b$ = 0.12424E-01. The importance of correction terms 
is clear from Fig. 5 of Ref. and in our approach it 
is reflected in the not too small value of $P_b$. 

FIG. 3. Noisy data. Plots of $P_b$ against A. Inset shows the 
estimated value of d as a function of A. 

The function $P_b$ for q = 1 for the above two-dimensional 
problem is shown as a surface plot over the 
(d, c) plane in Fig. 2. The sharpness of the minimum is 
noteworthy. In both the examples considered, the performance 
of the method is remarkable. 




FIG. 2. The residue $P_b$ with q = 1 is shown over the (d, c) 
plane. The z-axis is in log scale. A few contours of constant 
$\ln(P_b)$ are shown in different colors on the (d, c) plane. 



The last example we consider is the set of noisy data, 
where $c_N$ is calculated from Eq. (6) and Gaussian noise 
is added to it, giving $|c_N (1 + A\eta)|$, where η is 
a Gaussian deviate and A is the amplitude of the noise 
added. The absolute value is taken to keep $c_N$ positive. 
The values of the exponents are found to be insensitive 
to A for A < 0.1 and start changing for higher values of 
A. In Fig. 3 we show $P_b$ against A. The larger values of 
$P_b$ for larger A are a sign of poor collapse, as one finds by 
direct plotting with the estimated values or exact values. 
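
The construction of the noisy data can be sketched as follows, assuming multiplicative noise of the form |c_N (1 + A η)| with η a unit-variance Gaussian deviate. The exact specific heat is taken as c_N = (ln x)^2 2N (2x)^N / [2 + (2x)^N]^2 in units of k_B. The seed, the system size, the temperature grid, and the function names are arbitrary choices made for this example.

```python
import math
import random


def c_N(x, N):
    """Exact specific heat per site of the 1-d model, in units of k_B."""
    w = (2.0 * x) ** N
    return math.log(x) ** 2 * 2.0 * N * w / (2.0 + w) ** 2


def add_noise(value, amplitude, rng):
    """Return |value * (1 + amplitude * eta)|, eta being a unit Gaussian deviate."""
    return abs(value * (1.0 + amplitude * rng.gauss(0.0, 1.0)))


rng = random.Random(12345)  # fixed seed, chosen arbitrarily for reproducibility
N = 30
xs = [0.5 * (1.0 - t) for t in (-0.05, -0.02, 0.0, 0.02, 0.05)]
clean = [c_N(x, N) for x in xs]
noisy = [add_noise(c, 0.1, rng) for c in clean]      # amplitude A = 0.1
unchanged = [add_noise(c, 0.0, rng) for c in clean]  # A = 0 leaves the data intact
```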



To summarize, we have proposed a measure to quantify 
the nature of data collapse in any scaling analysis of 
the form given by Eq. (1). This measure can be used for 
an automated search for the exponents. The method is 
quite general, and even though we formulated it in terms 
of power laws as in Eq. (1), it can very easily be adapted 
to other forms of scaling. We conclude that the subjectiveness 
of data-collapse can be removed and $P_b$ could 
be used as a quantitative measure to test or compare 
"goodness of collapse" in any scaling analysis. 